Goto

Collaborating Authors

 provide feedback



Benchmarking the Robustness of Agentic Systems to Adversarially-Induced Harms

Nöther, Jonathan, Singla, Adish, Radanovic, Goran

arXiv.org Artificial Intelligence

Ensuring the safe use of agentic systems requires a thorough understanding of the range of malicious behaviors these systems may exhibit when under attack. In this paper, we evaluate the robustness of LLM-based agentic systems against attacks that aim to elicit harmful actions from agents. To this end, we propose a novel taxonomy of harms for agentic systems and a novel benchmark, BAD-ACTS, for studying the security of agentic systems with respect to a wide range of harmful actions. BAD-ACTS consists of 4 implementations of agentic systems in distinct application environments, as well as a dataset of 188 high-quality examples of harmful actions. This enables a comprehensive study of the robustness of agentic systems across a wide range of categories of harmful behaviors, available tools, and inter-agent communication structures. Using this benchmark, we analyze the robustness of agentic systems against an attacker that controls one of the agents in the system and aims to manipulate other agents to execute a harmful target action. Our results show that the attack has a high success rate, demonstrating that even a single adversarial agent within the system can have a significant impact on the security. This attack remains effective even when agents use a simple prompting-based defense strategy. However, we additionally propose a more effective defense based on message monitoring. We believe that this benchmark provides a diverse testbed for the security research of agentic systems. The benchmark can be found at github.com/JNoether/BAD-ACTS


A Potential Negative Societal Impacts

Neural Information Processing Systems

In addition, users may become overly dependent on the model's outputs For the feedback, we ask the person "Please consider the quality of the Given a score (1-5). 1 means its quality is bad, and 5 means its quality is very good". The interface of the user study is shown in Fig. A1. We report the average scores in Tab. We have a total of 1.1M training data in FIRE. In Fig. A2, we present the curves of A T, A TR, A TR, and RR using different Results show that more data leads to better performance.


Steered Generation via Gradient Descent on Sparse Features

Bhattacharyya, Sumanta, Rooshenas, Pedram

arXiv.org Artificial Intelligence

Large language models (LLMs) encode a diverse range of linguistic features within their latent representations, which can be harnessed to steer their output toward specific target characteristics. In this paper, we modify the internal structure of LLMs by training sparse autoencoders to learn a sparse representation of the query embedding, allowing precise control over the model's attention distribution. We demonstrate that manipulating this sparse representation effectively transforms the output toward different stylistic and cognitive targets. Specifically, in an educational setting, we show that the cognitive complexity of LLM-generated feedback can be systematically adjusted by modifying the encoded query representation at a specific layer. To achieve this, we guide the learned sparse embedding toward the representation of samples from the desired cognitive complexity level, using gradient-based optimization in the latent space.


SEFL: Harnessing Large Language Model Agents to Improve Educational Feedback Systems

Zhang, Mike, Dilling, Amalie Pernille, Gondelman, Léon, Lyngdorf, Niels Erik Ruan, Lindsay, Euan D., Bjerva, Johannes

arXiv.org Artificial Intelligence

Providing high-quality feedback is crucial for student success but is constrained by time, cost, and limited data availability. We introduce Synthetic Educational Feedback Loops (SEFL), a novel framework designed to deliver immediate, on-demand feedback at scale without relying on extensive, real-world student data. In SEFL, two large language models (LLMs) operate in teacher--student roles to simulate assignment completion and formative feedback, generating abundant synthetic pairs of student work and corresponding critiques. We then fine-tune smaller, more computationally efficient LLMs on these synthetic pairs, enabling them to replicate key features of high-quality, goal-oriented feedback. Unlike personalized tutoring approaches that offer multi-turn, individualized instruction, SEFL specifically focuses on replicating the teacher-->student feedback loop for diverse assignments. Through both LLM-as-a-judge and human evaluations, we demonstrate that SEFL-tuned models outperform their non-tuned counterparts in feedback quality, clarity, and timeliness. These findings reveal SEFL's potential to transform feedback processes for higher education and beyond, offering an ethical and scalable alternative to conventional manual feedback cycles.


Here's how ed-tech companies are pitching AI to teachers

MIT Technology Review

But this year, more and more educational technology companies are pitching schools on a different use of AI. Rather than scrambling to tamp down the use of it in the classroom, these companies are coaching teachers how to use AI tools to cut down on time they spend on tasks like grading, providing feedback to students, or planning lessons. One company, called Magic School, says its AI tools like quiz generators and text summarizers are used by 2.5 million educators. Khan Academy offers a digital tutor called Khanmigo, which it bills to teachers as "your free, AI-powered teaching assistant." Teachers can use it to assist students in subjects ranging from coding to humanities.


Continually Improving Extractive QA via Human Feedback

Gao, Ge, Chen, Hung-Ting, Artzi, Yoav, Choi, Eunsol

arXiv.org Artificial Intelligence

We study continually improving an extractive question answering (QA) system via human user feedback. We design and deploy an iterative approach, where information-seeking users ask questions, receive model-predicted answers, and provide feedback. We conduct experiments involving thousands of user interactions under diverse setups to broaden the understanding of learning from feedback over time. Our experiments show effective improvement from user feedback of extractive QA models over time across different data regimes, including significant potential for domain adaptation.


Smart tutor to provide feedback in programming courses

Roldán-Álvarez, David

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) is becoming more and more popular as time passes, allowing to perform tasks that were difficult to do in the past. From predictions to customization, AI is being used in many areas, not being educational environments outside this situation. AI is being used in educational settings to customize contents or to provide personalized feedback to the students, among others. In this scenario, AI in programming teaching is something that still has to be explored, since in this area we usually find assessment tools that allow grading the students work, but we can not find many tools aimed towards providing feedback to the students in the process of creating their program. In this work we present an AI based intelligent tutor that answers students programming questions. The tool has been tested by university students at the URJC along a whole course. Even if the tool is still in its preliminary phase, it helped the students with their questions, providing accurate answers and examples. The students were able to use the intelligent tutor easily and they thought that it could be a useful tool to use in other courses.


An Ontology of Co-Creative AI Systems

Lin, Zhiyu, Riedl, Mark

arXiv.org Artificial Intelligence

The term co-creativity has been used to describe a wide variety of human-AI assemblages in which human and AI are both involved in a creative endeavor. In order to assist with disambiguating research efforts, we present an ontology of co-creative systems, focusing on how responsibilities are divided between human and AI system and the information exchanged between them. We extend Lubart's original ontology of creativity support tools with three new categories emphasizing artificial intelligence: computer-as-subcontractor, computer-as-critic, and computer-as-teammate, some of which have sub-categorizations.


How sure is sure? Incorporating human error into machine learning

AIHub

Human error and uncertainty are concepts that many artificial intelligence systems fail to grasp, particularly in systems where a human provides feedback to a machine learning model. Many of these systems are programmed to assume that humans are always certain and correct, but real-world decision-making includes occasional mistakes and uncertainty. Researchers from the University of Cambridge, along with The Alan Turing Institute, Princeton, and Google DeepMind, have been attempting to bridge the gap between human behaviour and machine learning, so that uncertainty can be more fully accounted for in AI applications where humans and machines are working together. This could help reduce risk and improve trust and reliability of these applications, especially where safety is critical, such as medical diagnosis. The team adapted a well-known image classification dataset so that humans could provide feedback and indicate their level of uncertainty when labelling a particular image.